
GH-142002: Revert GC changes to "work_to_do" logic.#142001

Draft
nascheme wants to merge 1 commit intopython:mainfrom
nascheme:gc-revert-work-to-do-change

Conversation

@nascheme
Member

@nascheme nascheme commented Nov 27, 2025

This reverts parts of the GH-140262 change. The changes that affect tuple untracking are left in place. The changes to the calculation of the increment size, based on the "work_to_do" variable, are reverted; this causes cyclic garbage to be collected more quickly. The change to test_gc.py, which was made because the expected GC collection was taking longer to happen, is also reverted.

With the tuple untrack change, the performance regression as reported by bug GH-139951 is still resolved (work_to_do changes are not required).

One way to test the effect of this change is to run the following quick-and-dirty script. It creates many reference cycles and then counts how many have not yet been collected by the GC. At the end of the run, the maximum number of outstanding garbage objects is printed. With Python 3.13 (non-incremental GC), I get a max of less than 100 objects. With 3.14.0, I get approximately 2000 as the max. With the "main" branch (and the 3.14 branch), I get 10,000 as the max.

gc_report_cycles.py
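The script itself isn't attached in this text; a minimal sketch of how such a counter could be implemented (the Node class, loop count, and sampling interval are illustrative guesses, not the actual script):

```python
import weakref

class Node:
    def __init__(self):
        self.self = self  # trivial self-referential cycle: only the cyclic GC can free it

live = []            # weakrefs to Nodes the GC has not yet reclaimed
max_outstanding = 0

for i in range(50_000):
    live.append(weakref.ref(Node()))  # the Node becomes cyclic garbage immediately
    if i % 100 == 0:
        # Prune weakrefs to already-collected Nodes, then sample the backlog.
        live = [r for r in live if r() is not None]
        max_outstanding = max(max_outstanding, len(live))

print("max outstanding cycles:", max_outstanding)
```

The weakrefs let us observe reclamation without keeping the cycles alive, so the sampled backlog reflects how far the GC lags behind allocation.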

@nascheme
Member Author

nascheme commented Nov 27, 2025

@sergey-miryanov given your recent good work on GC-related things, you might want to take a look at this. This PR is an attempt at a quick fix, making the code work like the 3.14.0 release in this respect. However, for 3.15 I think we need to take a more careful look at how the GC increment size is computed. I'm going to create another PR that adds a "mark-alive" pass to the full GC run; that gives about a 2X performance improvement for full (manual) GC runs.

@nascheme nascheme changed the title Revert GC changes to "work_to_do" logic. GH-142002: Revert GC changes to "work_to_do" logic. Nov 27, 2025
@nascheme nascheme added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 27, 2025
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @nascheme for commit fef3eed 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F142001%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 27, 2025
@sergey-miryanov
Contributor

sergey-miryanov commented Nov 27, 2025

@nascheme Thanks! I will take a look.
I took some measurements in September. Maybe you'll find them interesting - faster-cpython/ideas#668 (comment)

I'm comparing 3.13 and main and have the following preliminary results (I have added heap_size to gcstate in 3.13):
3.13

[screenshot]

main (and I suppose this will be the same for 3.14)

[screenshot]

This is for benchmarks\bm_xml_etree\run_benchmark.py --worker --pipe 556 --worker-task=0 --values 3 --min-time 0.1 --loops 1 --warmups 1 --etree-module xml.etree.ElementTree run.

@sergey-miryanov
Contributor

sergey-miryanov commented Nov 27, 2025

And another couple of screenshots I made (they were made on commit 80e59a8)

[three screenshots]

I'm not sure that work_to_do is calculated properly at all. Maybe fixing #127519 is the key to this problem.

@gpshead
Member

gpshead commented Nov 28, 2025

If this lands, it'll need a backport because #140447 landed in 3.14.

@gpshead gpshead added needs backport to 3.14 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Nov 28, 2025
@markshannon
Member

@nascheme

gc_report_cycles.py produces tiny cycles that would never survive the first generation in the old GC. That doesn't seem very realistic.
Also, why the explicit call to gc.collect()?

We already have https://github.com/python/cpython/blob/main/Lib/test/_test_gc_fast_cycles.py to test that the amount of garbage stays within reasonable bounds, but it does produce larger, longer lived cycles.
Should we add another test that produces more, smaller cycles?

@markshannon
Member

@sergey-miryanov
Nice graphs.

I'm not sure that work_to_do is calculated properly at all.

It might be worth revisiting the calculation.
A couple of notes:

  • work_to_do should go negative, either because we've done a big chunk of work scanning the stacks or because the last increment was big, but I'm surprised how negative it goes.
  • In general, we want to be quite lazy when the heap is small, to avoid negatively impacting startup.
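The budget behavior being discussed can be modeled with a toy sketch (purely illustrative; the real logic lives in Python/gc.c, is written in C, and differs in detail): allocations add credit to work_to_do, an increment spends it, and an oversized increment drives the balance negative so that later increments return early.

```python
class IncrementalBudget:
    """Toy model of a work_to_do-style budget; constants are made up."""
    SCALE = 4  # hypothetical: units of scan work earned per new container object

    def __init__(self):
        self.work_to_do = 0

    def note_allocations(self, n):
        # New allocations schedule future collection work.
        self.work_to_do += n * self.SCALE

    def run_increment(self, objects_scanned):
        if self.work_to_do <= 0:
            return False              # early return: no work scheduled, nothing scanned
        self.work_to_do -= objects_scanned
        return True

budget = IncrementalBudget()
budget.note_allocations(100)          # 100 new objects -> 400 units of credit
first = budget.run_increment(1000)    # a big increment overshoots the budget
second = budget.run_increment(10)     # balance is now negative, so this is skipped
print(first, second, budget.work_to_do)  # True False -600
```

This illustrates why a deeply negative balance matters: every subsequent trigger is a no-op until enough new allocations pay the deficit back.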

@nascheme
Member Author

nascheme commented Dec 2, 2025

gc_report_cycles.py produces tiny cycles that would never survive the first generation in the old GC. That doesn't seem very realistic.

I don't think whether they are tiny or not is very relevant. If this cyclic garbage is holding on to a lot of memory, the increase in memory usage due to the GH-140262 change would be much worse.

IMO, we should be careful about changing how quickly the cyclic GC frees cyclic garbage in a bugfix release. At least, there should be some discussion and justification on why changing it is okay. GH-140262 doesn't even mention this change, let alone justify it. It changes the unit tests to make the expected iterations before collection 10x higher. Is that okay? I don't know.

Also, why the explicit call to gc.collect()?

Could be removed. It was a quick and dirty test adapted from a different test. A more realistic test (perhaps) would be to continuously create new objects, with a fraction of them being cyclic garbage. Then we could track peak memory usage and get a better idea of how these work_to_do() changes are affecting that.
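A sketch of the kind of harness suggested above (names, fractions, and sizes are illustrative, not an agreed design): allocate continuously, make a fraction of allocations cyclic garbage, and record peak traced memory with tracemalloc.

```python
import random
import tracemalloc

class Cycle:
    def __init__(self, payload):
        self.payload = payload
        self.self = self  # reference cycle: reclaimable only by the cyclic GC

def churn(n=20_000, cyclic_fraction=0.1, live_cap=1_000):
    """Allocate continuously; ~cyclic_fraction of allocations become
    immediately unreachable cyclic garbage. Returns peak traced bytes."""
    tracemalloc.start()
    live = []
    for i in range(n):
        payload = [i] * 8
        if random.random() < cyclic_fraction:
            Cycle(payload)          # dropped at once: cyclic garbage
        else:
            live.append(payload)
            if len(live) > live_cap:
                live.pop(0)         # keep the live set bounded
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

peak = churn()
print("peak traced bytes:", peak)
```

Comparing the returned peak across interpreter versions (or work_to_do variants) would show how far collection lags inflate memory, without depending on object counts.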

We already have https://github.com/python/cpython/blob/main/Lib/test/_test_gc_fast_cycles.py to test that the amount of garbage stays within reasonable bounds, but it does produce larger, longer lived cycles. Should we add another test that produces more, smaller cycles?

The fact that we had to change that test to run in a subprocess, in order to make it more reliable, is not a good sign. We don't have unit tests that simulate a program that allocates a large number of objects (e.g. a couple million, at least) and creates some significant fraction of garbage cycles (say 10% of allocations).

@markshannon
Member

A more realistic test (perhaps) would be to continuously create new objects, with a fraction of them being cyclic garbage.

More tests are always good. Maybe we could parameterize _test_gc_fast_cycles.py to do that.

Then we could track peak memory usage and get a better idea of how these work_to_do() changes are affecting that.

I'd love to base our heuristics on memory usage, rather than object count, but we really don't have enough control over the allocator to do that. Maybe in the future.

The fact that we had to change that test to run in a subprocess, in order to make it more reliable, is not a good sign.

It was moved to make it more predictable. It means we can use lower thresholds, giving a stricter test, as we don't need to account for the worst-case heap left over by previous tests.

@sergey-miryanov
Contributor

Working on #142516, I got the following:

I took @nascheme's script, which creates an SSL context and stores it in instances with cycles, and did some measurements.

For 3.14.0, the script uses less memory - it grows in a few steps and stabilizes at 104.3265. GC runs after the following iterations: 700, 1600, 2600, 3600, 4600, 5600, 6600 and 7600. Each time the garbage collector runs, it successfully clears the cycles and SSL contexts. I suspect the memory grows because we don't return it all at once to the OS. However, since we clear all previously allocated SSL contexts, subsequent calls reuse that memory.

For 3.14.1+, the script uses about four times more memory, which grows and then stabilizes after the first gc.collect() at 488.934. GC runs after the following iterations: 700, 1700, 2700, 3700, 4600, 5600, 6600 and 7600. However, it only performs real work after the 4600th iteration. This is because work_to_do is less than 0, and gc_collect_increment returns early on this check. The GC runs eight times but does real work only once. Since we "postpone" garbage collection, the memory used by the SSL contexts is higher. All subsequent SSL context creations reuse that memory, which is why it no longer grows after 488.

Full results here - https://github.com/sergey-miryanov/ssl-context-size

I believe we should apply this patch until we come up with another scheme for the incremental GC.

@sergey-miryanov
Contributor

There is a reproducer for #142516 that doesn't need the ssl module:

# ssl_context_size.py

import gc

def get_rss():
    import psutil

    p = psutil.Process()
    mem_info = p.memory_info()
    rss = mem_info.rss
    return rss / 1024.0

class Cell:
    def __init__(self, ctx):
        self.ctx = ctx
        self.self = self  # self-reference: each Cell is part of a reference cycle

def main():
    gc.collect(2)
    rss = get_rss()
    N = 16_000
    for i in range(N):
        Cell(b'bb' * (1024 * 256))  # ~512 kB payload, immediately cyclic garbage
        if i % 100 == 0:
            rss2 = get_rss()
            size = (rss2 - rss)
            print('idx', i, f'size {size:_} kB')

if __name__ == '__main__':
    main()

On main it grows to 5_687_720.0 kB. With this fix, the max value is 985_728.0 kB.

@cpoppema

cpoppema commented Mar 18, 2026

There is a reproducer for #142516 without ssl module:

# ssl_context_size.py

On main it grows to 5_687_720.0 kB. With this fix max value is 985_728.0 kB

I love this reproducer. I went ahead and ran it for my problem case as well:

  • prometheus_client
  • not using ssl
#!/usr/bin/env python
import gc
import socket

from prometheus_client import CollectorRegistry, Gauge, pushadd_to_gateway


REGISTRY = CollectorRegistry(auto_describe=True)
HOST = socket.gethostname()
LAST_RUN = Gauge("last_run_time", "Last time job has run.", registry=REGISTRY)


def push_metrics(job, registry):
    pushadd_to_gateway("http://pushgateway:9091", job=job, registry=registry, grouping_key={"instance": HOST})


def get_rss():
    import psutil

    p = psutil.Process()
    mem_info = p.memory_info()
    rss = mem_info.rss
    return rss / 1024.0


def main():
    gc.collect(2)
    rss = get_rss()
    N = 5_000
    for i in range(N):
        push_metrics("background_tasks", REGISTRY)
        if i % 100 == 0:
            rss2 = get_rss()
            size = rss2 - rss
            print(f"idx {i:>5} size {size:>10_} kB")


if __name__ == "__main__":
    main()

With 25+ jobs running on a 16 GB staging machine that is reset at least every week, I had never run out of RAM until 3.14(.3), and here you can see why.

Note that every time pushadd_to_gateway is called without the handler arg, the prometheus_client library creates a new handler function.

python-3.14.3 from the snippet:

idx     0 size    1_708.0 kB
idx   100 size   84_472.0 kB
idx   200 size  167_424.0 kB
idx   300 size  250_360.0 kB
idx   400 size  333_296.0 kB
idx   500 size  416_232.0 kB
idx   600 size  499_168.0 kB
idx   700 size  568_052.0 kB
idx   800 size  568_080.0 kB
idx   900 size  568_140.0 kB
idx  1000 size  568_376.0 kB
idx  1100 size  568_376.0 kB
idx  1200 size  568_376.0 kB
idx  1300 size  568_376.0 kB
idx  1400 size  568_376.0 kB
idx  1500 size  568_376.0 kB
idx  1600 size  568_376.0 kB
idx  1700 size  568_376.0 kB
idx  1800 size  568_376.0 kB
idx  1900 size  568_376.0 kB
idx  2000 size  568_376.0 kB
idx  2100 size  568_376.0 kB
idx  2200 size  568_376.0 kB
idx  2300 size  568_376.0 kB
idx  2400 size  568_376.0 kB
idx  2500 size  568_376.0 kB
idx  2600 size  568_376.0 kB
idx  2700 size  568_376.0 kB
idx  2800 size  568_376.0 kB
idx  2900 size  568_376.0 kB
idx  3000 size  568_376.0 kB
idx  3100 size  568_376.0 kB
idx  3200 size  568_376.0 kB
idx  3300 size  568_376.0 kB
idx  3400 size  568_376.0 kB
idx  3500 size  568_376.0 kB
idx  3600 size  568_376.0 kB
idx  3700 size  568_376.0 kB
idx  3800 size  568_376.0 kB
idx  3900 size  568_376.0 kB
idx  4000 size  568_376.0 kB
idx  4100 size  568_376.0 kB
idx  4200 size  568_376.0 kB
idx  4300 size  568_376.0 kB
idx  4400 size  568_376.0 kB
idx  4500 size  568_376.0 kB
idx  4600 size  568_376.0 kB
idx  4700 size  568_376.0 kB
idx  4800 size  568_376.0 kB
idx  4900 size  568_376.0 kB

python-3.13.12 from the snippet:

idx     0 size    2_408.0 kB
idx   100 size   42_872.0 kB
idx   200 size   44_548.0 kB
idx   300 size   46_208.0 kB
idx   400 size   47_888.0 kB
idx   500 size   49_540.0 kB
idx   600 size   49_556.0 kB
idx   700 size   49_556.0 kB
idx   800 size   49_556.0 kB
idx   900 size   49_560.0 kB
idx  1000 size   50_372.0 kB
idx  1100 size   52_024.0 kB
idx  1200 size   52_024.0 kB
idx  1300 size   52_024.0 kB
idx  1400 size   52_024.0 kB
idx  1500 size   52_024.0 kB
idx  1600 size   52_024.0 kB
idx  1700 size   52_856.0 kB
idx  1800 size   52_856.0 kB
idx  1900 size   52_856.0 kB
idx  2000 size   52_856.0 kB
idx  2100 size   52_856.0 kB
idx  2200 size   52_856.0 kB
idx  2300 size   53_684.0 kB
idx  2400 size   53_684.0 kB
idx  2500 size   53_684.0 kB
idx  2600 size   53_684.0 kB
idx  2700 size   53_684.0 kB
idx  2800 size   53_684.0 kB
idx  2900 size   54_512.0 kB
idx  3000 size   54_512.0 kB
idx  3100 size   54_512.0 kB
idx  3200 size   54_512.0 kB
idx  3300 size   54_512.0 kB
idx  3400 size   54_512.0 kB
idx  3500 size   55_340.0 kB
idx  3600 size   55_340.0 kB
idx  3700 size   55_340.0 kB
idx  3800 size   55_340.0 kB
idx  3900 size   55_340.0 kB
idx  4000 size   55_340.0 kB
idx  4100 size   56_172.0 kB
idx  4200 size   56_172.0 kB
idx  4300 size   56_172.0 kB
idx  4400 size   56_172.0 kB
idx  4500 size   56_172.0 kB
idx  4600 size   56_172.0 kB
idx  4700 size   57_000.0 kB
idx  4800 size   57_000.0 kB
idx  4900 size   57_000.0 kB

My current workaround is to call gc.collect() right after pushadd_to_gateway(), and that also has dramatic benefits (remember, this is multiplied across 25+ jobs):

python-3.14.3 after adding gc.collect() after pushadd_to_gateway:

idx     0 size    1_872.0 kB
idx   100 size    1_920.0 kB
idx   200 size    1_928.0 kB
idx   300 size    1_932.0 kB
idx   400 size    1_936.0 kB
idx   500 size    1_936.0 kB
idx   600 size    1_924.0 kB
idx   700 size    1_932.0 kB
idx   800 size    1_932.0 kB
idx   900 size    1_932.0 kB
idx  1000 size    1_932.0 kB
idx  1100 size    1_932.0 kB
idx  1200 size    1_932.0 kB
idx  1300 size    1_932.0 kB
idx  1400 size    1_936.0 kB
idx  1500 size    1_936.0 kB
idx  1600 size    1_936.0 kB
idx  1700 size    1_936.0 kB
idx  1800 size    1_936.0 kB
idx  1900 size    1_936.0 kB
idx  2000 size    1_936.0 kB
idx  2100 size    1_936.0 kB
idx  2200 size    1_936.0 kB
idx  2300 size    1_936.0 kB
idx  2400 size    1_936.0 kB
idx  2500 size    1_924.0 kB
idx  2600 size    1_924.0 kB
idx  2700 size    1_924.0 kB
idx  2800 size    1_924.0 kB
idx  2900 size    1_924.0 kB
idx  3000 size    1_924.0 kB
idx  3100 size    1_924.0 kB
idx  3200 size    1_924.0 kB
idx  3300 size    1_928.0 kB
idx  3400 size    1_932.0 kB
idx  3500 size    1_932.0 kB
idx  3600 size    1_932.0 kB
idx  3700 size    1_936.0 kB
idx  3800 size    1_936.0 kB
idx  3900 size    1_936.0 kB
idx  4000 size    1_940.0 kB
idx  4100 size    1_940.0 kB
idx  4200 size    1_940.0 kB
idx  4300 size    1_940.0 kB
idx  4400 size    1_940.0 kB
idx  4500 size    1_940.0 kB
idx  4600 size    1_940.0 kB
idx  4700 size    1_940.0 kB
idx  4800 size    1_940.0 kB
idx  4900 size    1_940.0 kB
